Abstract

Threshold effects in the estimation of parameters of non-linearly modulated, continuous-time,
wide-band waveforms are examined from a statistical physics perspective. These threshold
effects are shown to be analogous to phase transitions of certain disordered physical systems
in thermal equilibrium. The main message of this work is that this physical point of view may
be insightful for understanding the interactions between two or more parameters to be
estimated, from the aspects of the threshold effect.
1 Introduction

In waveform communication systems, the information is normally conveyed in a real-valued param- 
eter (or parameters) of a continuous-time signal to be transmitted, whereas the receiver is based on 
estimating this parameter from a noisy received version of this signal [19, Chap. 8]. This concept
of mapping a real-valued parameter, or a parameter vector, into a continuous-time signal, using a 
certain modulation scheme, stands at the basis of the theory and practice of Shannon-Kotel'nikov 
mappings, which can in turn be viewed as certain families of joint source-channel codes (see, e.g., 
[6] , [7] , [9] , [H] as well as many references therein). 

When the underlying modulation scheme is highly non-linear, like in frequency modulation 
(FM), phase modulation (PM), pulse position modulation (PPM), or frequency position modulation 
(FPM), it is well known that the estimation of the desired parameter is subjected to a threshold 
effect. This threshold effect means that the wider is the bandwidth of the transmitted signal, 
the better is the accuracy of the maximum likelihood (ML) estimator at the high signal-to-noise 
ratio (SNR) regime, but on the other hand, it comes at the price of increasing also a certain 
critical level of the SNR, referred to as the threshold SNR, below which this estimator breaks 
down. This breakdown means that the estimator makes gross errors (a.k.a. anomalous errors) with 
an overwhelmingly large probability, and in the high bandwidth regime, this breakdown becomes 
abrupt, as the SNR crosses the threshold value. This threshold effect is not merely an artifact to be 
attributed to a specific modulator and/or estimation method. It is a fundamental limitation which 
is inherent to any (non-linear) communication system operating under a limited power constraint 
over a wide-band channel. 

In this paper, we propose a statistical-mechanical perspective on the threshold effect. According 
to this perspective, the abrupt threshold effect of the wide-band regime is viewed as a phase 
transition of a certain disordered physical system of interacting particles. Specifically, this physical 
system turns out to be closely related (though not quite identical) to a well-known model in the 
statistical physics literature, which is called the random energy model (REM). The REM is one 
model (among many other models) for highly disordered magnetic materials, called spin glasses. 
The REM was invented by Derrida in the early 1980s [3], [4], [5], and it was
shown more recently in [13, Chap. 6] (see also [11]) to be intimately related to phase transitions in
the behavior of ensembles of random channel codes, not merely in the context of ordinary digital
decoding, but also in minimum mean square error (MMSE) signal estimation [12].

This paper, in contrast to [12], examines the physics of the threshold effect in the estimation 
of a continuous-valued parameter, rather than the estimation of the signal itself. For the sake 
of simplicity and concreteness, the analogy between the threshold effect and phase transitions is 
demonstrated in the context of estimating the delay (or the position) of a narrow rectangular 
pulse, but the methodology is generalizable to other situations, as discussed in the sequel. A phase 
diagram with three phases (similarly as in [13] ) is obtained in the plane of two design parameters 
of the communication system, one pertaining to the signal bandwidth, and the other to a certain 
notion of temperature (which will be made clear in the sequel). 

Beyond the fact that this relationship, between the threshold effect in parameter estimation and 
phase transitions in physics, may be interesting on its own right, we also believe that the physical 
point of view may provide insights and tools for understanding the interactions and the collective 
behavior of the joint ML estimators of two or more parameters, in the context of the threshold effect. 
For example, suppose that both the amplitude and the delay of a narrow pulse are to be estimated. 
While the amplitude estimation alone does not exhibit any threshold effect (as the modulation is 
linear) and the delay estimation alone displays a phase diagram with three phases, it turns out that 
when joint ML estimation of both amplitude and delay is considered, the interaction between them 
exhibits a surprisingly more erratic behavior, than that of the delay parameter alone: It possesses 
as many as five different phases in the plane of bandwidth vs. temperature. Moreover, the behavior 
of the anomalous errors (below the threshold) pertaining to the amplitude and the delay are very 
different in character, and it is the physical point of view that gives rise to understanding them. 

The outline of this paper is as follows. In Section 2, we provide some basic background on the
threshold effect in non-linear modulation and estimation. In Section 3, we present the threshold
effect from the physics viewpoint and, in particular, we show how it is related to phase transitions
pertaining to the REM. In Section 4, we consider joint ML estimation of amplitude and delay, as
described in the previous paragraph, and provide the phase diagram. Finally, in Section 5, we
summarize and conclude this work.



2 Background 

We begin with some basic background on ML parameter estimation for non-linearly modulated 
signals in additive white Gaussian noise (AWGN), the threshold effect pertaining to this estimation, 
and then the signal design problem, first for band-limited signals, and then in the large bandwidth
limit. The material in this section, which is mostly classical and can be found in [19, Chap. 8], is
briefly reviewed here merely for the sake of completeness and convenience of the reader.

Consider the following estimation problem. We are given a parametric family of waveforms
$\{s_m(t),\ -T/2 \le t \le +T/2\}$, where $m$ is the parameter, which for convenience will be assumed
a (deterministic) scalar that takes on values in some interval $[-M,+M]$ ($M > 0$). Now suppose
that we observe a noisy version of $s_m(t)$ along the time interval $[-T/2,+T/2]$, i.e.,

$$r(t) = s_m(t) + n(t), \qquad -\frac{T}{2} \le t \le +\frac{T}{2}, \tag{1}$$

where $\{n(t)\}$ is a zero-mean Gaussian white noise with spectral density $N_0/2$, and we wish to
estimate $m$ from $r = \{r(t),\ -T/2 \le t \le +T/2\}$. Maximum likelihood (ML) estimation, in the
Gaussian case considered here, is obviously equivalent to the minimization of

$$\int_0^T [r(t) - s_m(t)]^2\, dt \tag{2}$$

w.r.t. $m$. The simplest example is the one where the parametrization of the signal is linear in $m$,
i.e., $s_m(t) = m \cdot s(t)$, where $\{s(t),\ -T/2 \le t \le +T/2\}$ is a given waveform (independent of $m$).
In this case, ML estimation yields

$$\hat{m} = \frac{\int_0^T r(t)s(t)\,dt}{\int_0^T s^2(t)\,dt} = \frac{1}{E}\int_0^T r(t)s(t)\,dt, \tag{3}$$

where $E$ designates the energy of $\{s(t)\}$, i.e., $E = \int_0^T s^2(t)\,dt$, and the mean square error (MSE) is
readily obtained as

$$E\{(\hat{m} - m)^2\} = \frac{N_0}{2E}. \tag{4}$$

The estimation performance depends on the signal $\{s(t)\}$ only via its energy, $E$. Since this MSE
achieves the Cramér-Rao lower bound, this is essentially the best one can do (at least as far as
unbiased estimators go) with linear parametrization, for a given SNR $E/N_0$.
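As a quick numerical sanity check of (3) and (4), the following minimal sketch simulates the linear model in a finite-dimensional coefficient space. The discretization, the particular vector `s`, the value of `n0`, the seed, and the function names are all illustrative assumptions and do not come from the text.

```python
import math
import random

def ml_linear_estimate(r, s):
    """ML estimate of m in r = m*s + n: correlate with s, normalize by its energy."""
    energy = sum(x * x for x in s)
    return sum(ri * si for ri, si in zip(r, s)) / energy

def simulate_mse(m, s, n0, trials=20000, seed=0):
    """Empirical MSE of the ML estimator; noise coefficients are N(0, n0/2)."""
    rng = random.Random(seed)          # fixed seed (assumption) for repeatability
    sigma = math.sqrt(n0 / 2.0)
    total = 0.0
    for _ in range(trials):
        r = [m * si + rng.gauss(0.0, sigma) for si in s]
        total += (ml_linear_estimate(r, s) - m) ** 2
    return total / trials

s = [1.0, -2.0, 0.5, 1.5]              # hypothetical signal coefficients
n0 = 0.8                               # hypothetical noise level
energy = sum(x * x for x in s)
empirical = simulate_mse(0.3, s, n0)
predicted = n0 / (2.0 * energy)        # eq. (4)
```

Under these assumptions, `empirical` should agree with `predicted` up to Monte Carlo fluctuations, regardless of the shape of `s`, which illustrates that only the energy matters in the linear case.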



The only way then to improve on this result, at least for very large SNR, is to extend the
scope to non-linear parametrizations of $\{s_m(t)\}$. For example, $m$ can stand for the delay (or the
position) of a given pulse $s(t)$, i.e., $s_m(t) = s(t - m)$. Also, in the case of a sinusoidal waveform,
$s(t) = A\sin(\omega t + \phi)$ (with $A$, $\omega$ and $\phi$ being fixed parameters), $m$ can designate a frequency offset,
as in $s_m(t) = A\sin[(\omega + m)t + \phi]$, or a phase offset, as in $s_m(t) = A\sin(\omega t + \phi + m)$. In these
examples, the MSE in the high SNR regime depends not only on the SNR, $E/N_0$, but also on the
shape of the waveform, i.e., on some notion of bandwidth: rapidly varying signals can be estimated
more accurately than slowly varying ones. To demonstrate this, let us assume that the noise is
very weak, and the true parameter is $m = m_0$. For small deviations from $m_0$, we consider the
linearization

$$s_m(t) \approx s_{m_0}(t) + (m - m_0)\,\dot{s}_{m_0}(t), \tag{5}$$

where $\dot{s}_{m_0}(t) = \mathrm{d}s_m(t)/\mathrm{d}m\,|_{m=m_0}$. This is then essentially the same linear model as before, where
the previous role of $\{s(t)\}$ is now played by $\{\dot{s}_{m_0}(t)\}$, and so, the MSE is about

$$E\{(\hat{m} - m_0)^2\} \approx \frac{N_0}{2\dot{E}}, \tag{6}$$

where $\dot{E}$ is the energy of $\{\dot{s}_{m_0}(t)\}$, which depends, of course, not only on $E$, but also on the shape
of $\{s_m(t)\}$. For example, if $m$ is a delay parameter, $s_m(t) = s(t - m)$, and $\{s(t)\}$ contains a narrow
pulse (or pulses) compared to $T$, then $\dot{E} = \int_0^T \dot{s}^2(t)\,dt$, essentially independently of $m$, where $\dot{s}(t)$
is the time derivative of $s(t)$. By the Parseval theorem,

$$\int_0^T \dot{s}^2(t)\,dt = \int_{-\infty}^{+\infty} df\,(2\pi f)^2 |S(f)|^2, \tag{7}$$

where $S(f)$ is the Fourier transform of $\{s(t)\}$, and so, we have $\dot{E} = W^2 E$, where $W$ is the effective
bandwidth of $s(t)$ in the second moment sense, a.k.a. the Gabor bandwidth. We then have

$$E\{(\hat{m} - m_0)^2\} \approx \frac{N_0}{2W^2E}, \tag{8}$$

which means that the MSE depends not only on $E/N_0$, but also on the signal shape, in this case, its
Gabor bandwidth, $W$. One might be tempted to think that the larger is $W$, the better is the MSE.
However, there is a price for increasing $W$: the probability of anomalous errors increases.

To understand the effect of anomaly, it is instructive to look at the broader picture: Let us
assume that the parametric family of signals $\{s_m(t):\ -M \le m \le +M\}$ lies in the linear space
spanned by a set of $K$ orthonormal basis functions $\{\phi_i(t)\}_{i=1}^K$, defined over $-T/2 \le t \le +T/2$, and
so, we can pass from continuous time signals to vectors of coefficients:

$$s_m(t) = \sum_{i=1}^K s_i(m)\phi_i(t) \tag{9}$$

with

$$s_i(m) = \int_0^T s_m(t)\phi_i(t)\,dt, \tag{10}$$

and let us apply similar decompositions to $r(t)$ and $n(t)$, so as to obtain vectors of coefficients
$\boldsymbol{r} = (r_1, \ldots, r_K)$ and $\boldsymbol{n} = (n_1, \ldots, n_K)$, related by

$$r_i = s_i(m) + n_i, \qquad i = 1, 2, \ldots, K, \tag{11}$$

where $n_i \sim \mathcal{N}(0, N_0/2)$, or

$$\boldsymbol{r} = \boldsymbol{s}(m) + \boldsymbol{n}. \tag{12}$$

As in the example of a delay parameter, let us assume that both the energy $E$ of the signal $\{s_m(t)\}$
itself, and the energy $\dot{E}$ of its derivative w.r.t. $m$, $\{\dot{s}_m(t)\}$, are fixed, independently of $m$. In
other words, $\sum_i s_i^2(m) = E$ and $\sum_i \dot{s}_i^2(m) = \dot{E}$ for all $m$. Consider the locus of the signal vectors
$[s_1(m), \ldots, s_K(m)]$ in $\mathbb{R}^K$ as $m$ varies from $-M$ to $+M$. On the one hand, this locus is constrained
to lie on the hyper-surface of a $K$-dimensional sphere of radius $\sqrt{E}$; on the other hand, since the
high-SNR MSE behaves according to $N_0/(2\dot{E})$, we would like $\dot{E} = \sum_i \dot{s}_i^2(m)$ to be as large as
possible. But $\dot{E}$ is related to the length $L$ of the signal locus in $\mathbb{R}^K$ according to

$$L = \int_{-M}^{+M} dm\,\sqrt{\sum_i \dot{s}_i^2(m)} = 2M\sqrt{\dot{E}}, \tag{13}$$

where we have used the assumption that the norm of $\dot{\boldsymbol{s}}(m) = (\dot{s}_1(m), \ldots, \dot{s}_K(m))$ is independent
of $m$. Thus, the high-SNR MSE is about

$$E\{(\hat{m} - m)^2\} \approx \frac{N_0}{2\dot{E}} = \frac{2M^2N_0}{L^2}, \tag{14}$$

which means that we would like to make the signal locus as long as possible, in order to minimize
the high-SNR MSE.

Our problem is then to design a signal locus, as long as possible, which lies on the hyper-surface
of a $K$-dimensional sphere of radius $\sqrt{E}$. Since our room is limited by this energy constraint, a
long locus would mean that it is very curvy, with many sharp foldings, and there must then be
pairs of points $m_1$ and $m_2$, which are far apart, yet $\boldsymbol{s}(m_1)$ and $\boldsymbol{s}(m_2)$ are close in the Euclidean
distance sense. In this case, if the noise vector $\boldsymbol{n}$ has a sufficiently large projection in the direction
of $\boldsymbol{s}(m_2) - \boldsymbol{s}(m_1)$, it can cause a gross error, confusing $m_1$ with $m_2$. Moreover, in high dimension $K$,
there can be many more than one such problematic (orthogonal) direction in the above described
sense, and then the event of anomalous error, which is the event that the noise projection is large in
at least one of these directions, gains an appreciably large probability. Thus, as the locus of $\boldsymbol{s}(m)$
bends, various folds of the curve must be kept sufficiently far apart in all dimensions, so that the
noise cannot cause anomalous errors with high probability. The probability of anomaly then sets the
limit on the length of the curve, and hence also on the high SNR MSE. The maximum locus length $L$
is shown in [19] to grow exponentially with $T$ in the large $T$ limit, at a rate determined by $C$, where
$C$ is the capacity of the infinite-bandwidth AWGN channel, given by $C = P/N_0$, with $P = E/T$ being
the signal power. This maximum is essentially attained by the family of frequency-position modulation (FPM)
signals (see [19]), as well as by pulse-position modulation (PPM) signals, considered hereafter.

As is shown in [19, Chap. 8], if the signal space is spanned by $K \approx 2WT$ dimensions of signals of
duration $T$ and fixed bandwidth $W$, namely, $K$ grows linearly with $T$ for fixed $W$, the probability
of anomaly is about $K \cdot e^{-E/(2N_0)}$, and so, the total MSE behaves (see [19, eq. (8.100), p. 633])
roughly according to

$$E\{(\hat{m} - m)^2\} \approx \frac{N_0}{2\dot{E}} + B \cdot K e^{-E/(2N_0)}, \tag{15}$$

where $B > 0$ is some constant, the first term accounts for the high-SNR MSE, and the second
term is the MSE dictated by the probability of an anomalous error. Note that here the degradation
contributed by the anomalous error, as a function of $N_0$, is graceful; in other words, there is still
no sharp breakdown of the kind that was described in the previous paragraph. This is because,
as long as $W$ is fixed, the $K = 2WT$ orthonormal basis functions may capture
only a very small fraction of the 'problematic directions' (as described in the previous paragraph)
of the entire plethora of 'directions' of the noise, which is of infinite bandwidth. In other words,
since the probability of a large noise projection in a certain direction is exponentially small, it
takes exponentially many directions to make the probability of a large projection in at least one
of them considerably large. As the energies $E$ and $\dot{E}$ grow linearly with $T$ (for fixed power and
bandwidth), the first term in (15) is proportional to $1/T$ while the second term decays exponentially
in $T$. A natural question that arises then is whether there may be better trade-offs. The answer
is affirmative if $W$ is allowed to grow (exponentially fast) with $T$. Assuming then that
$W \propto e^{RT}$ for some fixed parameter $R > 0$, the first term would then decay at the rate of $e^{-2RT}$,
whereas the second term may still continue to decay exponentially as long as $R$ is not too large.
The exact behavior depends, of course, on the form of the parametric family of signals $\{s_m(t)\}$,
but for some classes of signals, like those pertaining to FPM, it is shown in [19] that the probability
of anomaly decays according to $e^{-TE(R)}$, where $E(R)$ is the error exponent function pertaining
to infinite-bandwidth orthogonal signals over the additive white Gaussian noise (AWGN) channel,
i.e.,

$$E(R) = \begin{cases} \dfrac{C}{2} - R, & R < \dfrac{C}{4} \\[2mm] \left(\sqrt{C} - \sqrt{R}\right)^2, & \dfrac{C}{4} \le R < C \end{cases} \tag{16}$$

Note that the best compromise between high-SNR MSE and anomalous MSE pertains to the
solution to the equation $E(R) = 2R$, namely, $R = C/6$. For $R > C$, the probability of anomaly
tends to 1 as $T \to \infty$. Thus, we observe that in the regime of unlimited bandwidth, the threshold
effect pertaining to anomalous errors is indeed sharp, while in the band-limited case, it is not.
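The balance point $R = C/6$ can be checked numerically. The following minimal sketch evaluates the error exponent of (16) and solves $E(R) = 2R$ by bisection; the function names, the bisection tolerance, and the example value $C = 3$ are illustrative assumptions.

```python
import math

def error_exponent(rate, capacity):
    """Error exponent E(R) of eq. (16); taken to be zero for R >= C."""
    if rate < 0 or capacity <= 0:
        raise ValueError("rate must be non-negative and capacity positive")
    if rate < capacity / 4.0:
        return capacity / 2.0 - rate
    if rate < capacity:
        return (math.sqrt(capacity) - math.sqrt(rate)) ** 2
    return 0.0

def best_rate(capacity, tol=1e-12):
    """Solve E(R) = 2R by bisection; E(R) - 2R is decreasing on [0, C]."""
    lo, hi = 0.0, capacity
    while hi - lo > tol * capacity:
        mid = 0.5 * (lo + hi)
        if error_exponent(mid, capacity) > 2.0 * mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rate_star = best_rate(3.0)   # hypothetical capacity C = 3; expected C/6
```

Since $C/6 < C/4$, the solution falls in the linear branch of $E(R)$, where $C/2 - R = 2R$ gives $R = C/6$ directly; the bisection is only meant as an independent check.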

Our purpose, in this work, is to study the threshold effect of anomalous errors, in the unlimited 
bandwidth regime, from a physical point of view, by relating the threshold effect to phase transitions 
of large physical systems subjected to disorder, in particular, a REM-like model, as described in 
the Introduction. The limit of large T would then correspond to the thermodynamic limit of a large 
system, customarily considered in statistical physics. Moreover, as discussed earlier, the physical 
point of view will help us to understand situations where there is more than one phase transition. 

3 A Physical Perspective on the Threshold Effect 

For the sake of concreteness, we consider the case where the parameter $m$ is time delay, defined in
units of $T$.$^1$ Let then

$$r(t) = s(t - mT) + n(t), \qquad -\frac{T}{2} \le t < +\frac{T}{2}, \quad -M \le m \le +M, \quad M < \frac{1}{2}. \tag{17}$$

We will also assume that the signal autocorrelation function, i.e.,

$$R_s(\tau) \triangleq \int_{-T/2}^{+T/2} dt\, s(t)s(t + \tau), \tag{18}$$

vanishes outside the interval $[-\Delta, +\Delta]$. In this case, it is natural to define the anomalous error
event as the event where the absolute value of the estimation error, $|\hat{m} - m|$, exceeds $\Delta/T$ (recall
that $m$ is measured in units of $T$). Since the signal energy is $E$, then so is $R_s(0)$. Assuming that
the signal support lies entirely within the interval $[-T/2, +T/2]$ for all allowable values of $m$
(i.e., $M \le \frac{1}{2} - \Delta/T$), the energy of $\{s(t - mT)\}$ is independent of $m$, and then maximum
likelihood estimation is equivalent to maximum correlation:

$$\hat{m} = \arg\max_{m:\ |m| \le M} \int_{-T/2}^{+T/2} dt\, r(t)s(t - mT). \tag{19}$$

$^1$ More general situations will be discussed in the sequel.

If one treats $m$ as a uniformly distributed random variable, the corresponding posterior density of
$m$ given $\{r(t),\ -T/2 \le t \le T/2\}$ is given by

$$\begin{aligned}
P(m \mid \{r(t),\ -T/2 \le t \le T/2\})
&= \frac{\exp\left\{-\frac{1}{N_0}\int_{-T/2}^{+T/2}[r(t) - s(t - mT)]^2\,dt\right\}}{\int_{-M}^{+M} dm' \exp\left\{-\frac{1}{N_0}\int_{-T/2}^{+T/2}[r(t) - s(t - m'T)]^2\,dt\right\}} \\
&= \frac{\exp\left\{\frac{2}{N_0}\int_{-T/2}^{+T/2} r(t)s(t - mT)\,dt\right\}}{\int_{-M}^{+M} dm' \exp\left\{\frac{2}{N_0}\int_{-T/2}^{+T/2} r(t)s(t - m'T)\,dt\right\}},
\end{aligned} \tag{20}$$

where in the second equality, we have cancelled out the factor $\exp\{-\frac{1}{N_0}\int_{-T/2}^{+T/2} r^2(t)\,dt\}$, which
appears both in the numerator and the denominator, and we have used again the fact that the
energy, $E$, of $\{s(t - mT)\}$ is independent of $m$. Owing to the exponential form of this posterior
distribution, it can be thought of, in the language of statistical mechanics, as the Boltzmann
distribution with inverse temperature $\beta = 2/N_0$ and Hamiltonian (i.e., energy as a function of $m$):

$$\mathcal{H}(m) = -\int_{-T/2}^{+T/2} dt\, r(t)s(t - mT). \tag{21}$$

This statistical-mechanical point of view suggests expanding the scope and defining a family of
probability distributions parametrized by $\beta$, as follows:

$$P_\beta(m \mid \{r(t),\ -T/2 \le t \le T/2\}) = \frac{\exp\left\{\beta\int_{-T/2}^{+T/2} r(t)s(t - mT)\,dt\right\}}{\int_{-M}^{+M} dm' \exp\left\{\beta\int_{-T/2}^{+T/2} r(t)s(t - m'T)\,dt\right\}}. \tag{22}$$

There are at least three meaningful choices of the value of the parameter $\beta$: The first is $\beta = 0$,
corresponding to the uniform distribution on $[-M, +M]$, which is the prior. The second choice is
$\beta = 2/N_0$, which corresponds to the true posterior distribution, as said. Finally, as $\beta \to \infty$, the
density $P_\beta(\cdot \mid \{r(t),\ -T/2 \le t \le T/2\})$ puts more and more weight on the value of $m$ that maximizes
the correlation $\int_{-T/2}^{+T/2} dt\, r(t)s(t - mT)$, namely, on the ML estimator $\hat{m}$. It should be emphasized
that if we vary the parameter $\beta$, this is not necessarily equivalent to a corresponding variation in
the choice of $N_0$, according to $\beta = 2/N_0$. For example, one may examine the behavior of the ML
estimator by letting $\beta \to \infty$, but still analyze its performance for a given finite value of $N_0$. This
is to say that $P_\beta(\cdot \mid \{r(t),\ -T/2 \le t \le T/2\})$ should only be thought of as an auxiliary posterior
density function, not as the real one. The denominator of $P_\beta(m \mid \{r(t),\ -T/2 \le t \le T/2\})$, namely,

$$\zeta(\beta) = \int_{-M}^{+M} dm\, \exp\left\{\beta\int_{-T/2}^{+T/2} r(t)s(t - mT)\,dt\right\}, \tag{23}$$

can then be thought of as the partition function pertaining to the Boltzmann distribution
$P_\beta(\cdot \mid \{r(t),\ -T/2 \le t \le T/2\})$.
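The role of $\beta$ in (22) can be illustrated on a grid of candidate delays. The sketch below is a discretized stand-in for $P_\beta$, assuming the correlation values have already been computed on a finite grid; the function name, the grid, and the sample correlation values are hypothetical and not taken from the text.

```python
import math

def boltzmann_weights(correlations, beta):
    """Discretized P_beta: weights proportional to exp(beta * correlation).

    The largest correlation is subtracted before exponentiating, which leaves
    the normalized weights unchanged but avoids overflow for large beta.
    """
    peak = max(correlations)
    unnormalized = [math.exp(beta * (c - peak)) for c in correlations]
    partition = sum(unnormalized)      # discrete analogue of zeta(beta), up to a factor
    return [u / partition for u in unnormalized]

corr = [0.2, 1.5, -0.3, 0.9]           # hypothetical correlation values on a delay grid
prior_like = boltzmann_weights(corr, 0.0)      # beta = 0: uniform weights
ml_like = boltzmann_weights(corr, 200.0)       # large beta: mass on the maximizer
```

With $\beta = 0$ the weights are uniform (the prior), and as $\beta$ grows the mass concentrates on the grid point of maximum correlation, mimicking the ML estimator.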

Now, without essential loss of generality, let us assume that the true parameter value is $m = 0$,
that $\Delta$ divides $2MT$, and that the integer $K = 2MT/\Delta$ is an even number. Consider the
partition of the interval $[-M, +M]$ of possible values of $m$ into sub-intervals of size $\Delta/T$. Let
$\mathcal{M}_i = [i\Delta/T, (i+1)\Delta/T)$ denote the $i$-th sub-interval, $i = -K/2, -K/2+1, \ldots, -1, 0, +1, \ldots, K/2-1$.
We will find it convenient to view the ML estimation of $m$ as a two-step procedure, where one first
maximizes the correlation $\int_{-T/2}^{+T/2} dt\, r(t)s(t - mT)$ within each sub-interval $\mathcal{M}_i$, i.e., calculates

$$\max_{m \in \mathcal{M}_i} \int_{-T/2}^{+T/2} dt\, r(t)s(t - mT), \tag{24}$$

and then takes the largest maximum over all $i$. Let us define

$$\epsilon_0 = \max_{|m| \le \Delta/T} \int_{-T/2}^{+T/2} dt\, r(t)s(t - mT) = \max_{m \in \mathcal{M}_0 \cup \mathcal{M}_{-1}} \int_{-T/2}^{+T/2} dt\, r(t)s(t - mT) \tag{25}$$

and for $i \ne 0$,

$$\epsilon_i = \max_{m \in \mathcal{M}_i} \int_{-T/2}^{+T/2} dt\, r(t)s(t - mT), \qquad 1 \le i \le K/2 - 1, \tag{26}$$

$$\epsilon_i = \max_{m \in \mathcal{M}_{i-1}} \int_{-T/2}^{+T/2} dt\, r(t)s(t - mT), \qquad -(K/2 - 1) \le i \le -1. \tag{27}$$

Thus, for the purpose of analyzing the behavior of the ML estimator, we can use a modified version
of the partition function, defined as

$$Z(\beta) = \sum_{i=-K/2+1}^{K/2-1} e^{\beta\epsilon_i}, \tag{28}$$

and analyze it in the limit of $\beta \to \infty$ (the low temperature limit). Note that here, $\epsilon_i$ has the
meaning of the (negative) Hamiltonian pertaining to a 'system configuration' indexed by $i$.






In order to characterize the behavior of $Z(\beta)$, it is instructive to recognize that it is quite
similar to the random energy model (REM) of disordered spin glasses: According to the REM, the
energies $\{\epsilon_i\}$, pertaining to various system configurations indexed by $i$, are i.i.d. random variables,
normally assumed zero-mean and Gaussian, but other distributions are possible too. This is not
quite exactly our case, but as we shall see shortly, this is close enough to allow the techniques
associated with the analysis of the REM to be applicable here.

First, observe that under these assumptions,

$$\begin{aligned}
\epsilon_0 &= \max_{|m| \le \Delta/T} \int_{-T/2}^{+T/2} dt\,[s(t) + n(t)]s(t - mT) \\
&= \max_{|m| \le \Delta/T} \left[R_s(mT) + \int_{-T/2}^{+T/2} dt\, n(t)s(t - mT)\right],
\end{aligned} \tag{29}$$

whereas for $i \ne 0$,

$$\epsilon_i = \max_{m \in \mathcal{M}_i} \int_{-T/2}^{+T/2} dt\, n(t)s(t - mT), \qquad i > 0, \tag{30}$$

and

$$\epsilon_i = \max_{m \in \mathcal{M}_{i-1}} \int_{-T/2}^{+T/2} dt\, n(t)s(t - mT), \qquad i < 0. \tag{31}$$

As for $\epsilon_0$, we have, on the one hand,

$$\epsilon_0 \ge R_s(0) + \int_{-T/2}^{+T/2} dt\, n(t)s(t) = PT + \int_{-T/2}^{+T/2} dt\, n(t)s(t), \tag{32}$$

and on the other hand,

$$\epsilon_0 \le \max_{|m| \le \Delta/T} R_s(mT) + \max_{|m| \le \Delta/T} \int_{-T/2}^{+T/2} dt\, n(t)s(t - mT) = PT + \max_{|m| \le \Delta/T} \int_{-T/2}^{+T/2} dt\, n(t)s(t - mT). \tag{33}$$

Considering the limit $T \to \infty$ for fixed $P$, both the upper bound and the lower bound are dominated
by the first term, which grows linearly with $T$, while the second term is a random variable whose
standard deviation, for large $T$, scales in proportion to $\sqrt{T}$. Thus, for a typical realization of
$\{n(t),\ -T/2 \le t \le T/2\}$, $\epsilon_0 \approx PT$, and so, its typical contribution to the partition function is given
by

$$Z_0(\beta) \triangleq e^{\beta\epsilon_0} \approx e^{\beta PT}. \tag{34}$$

Consider now the contribution of all the other $\{\epsilon_i\}$ to the partition function, and define

$$Z_a(\beta) = \sum_{i \ne 0} e^{\beta\epsilon_i}, \tag{35}$$

where the subscript $a$ stands for 'anomaly', as this term pertains to anomalous errors. The total
partition function is, of course,

$$Z(\beta) = Z_0(\beta) + Z_a(\beta). \tag{36}$$

Now, for $i \ne 0$, $\{\epsilon_i\}$ are identically distributed random variables, which are alternately independent, i.e.,
$\ldots, \epsilon_{-3}, \epsilon_{-1}, \epsilon_1, \epsilon_3, \ldots$ are independent (since the noise is white and $R_s(\tau)$ vanishes for $|\tau| \ge \Delta$),
and so are $\ldots, \epsilon_{-4}, \epsilon_{-2}, \epsilon_2, \epsilon_4, \ldots$. In order to evaluate the typical behavior of $Z_a(\beta)$, we shall
represent it as

$$Z_a(\beta) = \int d\epsilon\, N(\epsilon)e^{\beta\epsilon}, \tag{37}$$

where $N(\epsilon)d\epsilon$ is the number of $\{\epsilon_i\}$ that fall between $\epsilon$ and $\epsilon + d\epsilon$, i.e.,

$$N(\epsilon)d\epsilon = \sum_{i \ne 0} \mathcal{I}(\epsilon \le \epsilon_i < \epsilon + d\epsilon), \tag{38}$$

where $\mathcal{I}(\cdot)$ is the indicator function of an event. Obviously,

$$E\{N(\epsilon)d\epsilon\} = K \cdot \Pr\{\epsilon \le \epsilon_i < \epsilon + d\epsilon\} \tag{39}$$

and so,

$$E\{N(\epsilon)\} = K \cdot f(\epsilon), \tag{40}$$

where $f(\epsilon)$ is the probability density function (pdf) of $\epsilon_i$, for $i \ne 0$. Now, to accommodate the
asymptotic regime of $W \propto e^{RT}$, we take the signal duration to be $\Delta = \Delta_0 e^{-RT}$, where $\Delta_0 > 0$ is a
fixed parameter, and so,

$$K = \frac{2MT}{\Delta_0} \cdot e^{RT}. \tag{41}$$

Thus, $N(\epsilon)d\epsilon$ is the sum of exponentially many binary random variables. As said earlier, although
these random variables are not independent, they are alternately independent, and so, if $N(\epsilon)d\epsilon$ is
represented as

$$\sum_{i \ne 0,\ i\ \mathrm{even}} \mathcal{I}(\epsilon \le \epsilon_i < \epsilon + d\epsilon) + \sum_{i\ \mathrm{odd}} \mathcal{I}(\epsilon \le \epsilon_i < \epsilon + d\epsilon), \tag{42}$$

then each of the two terms is the sum of i.i.d. binary random variables, whose typical value is zero
when $\frac{K}{2} \cdot f(\epsilon)d\epsilon \ll 1$ and $E\{N(\epsilon)d\epsilon\}$ when $\frac{K}{2} \cdot f(\epsilon)d\epsilon \gg 1$. This means that, asymptotically,
for large $T$, only energy levels for which $\ln f(\epsilon) > -RT$ will typically be populated by some $\{\epsilon_i\}$.
Let $\epsilon_T$ be the largest solution to the equation $\ln f(\epsilon) = -RT$. Then, the typical value of $Z_a$ is
exponentially

$$Z_a(\beta, R) \doteq \int_{\epsilon \le \epsilon_T} d\epsilon\, e^{RT} f(\epsilon)e^{\beta\epsilon} \doteq \exp\left\{\max_{\epsilon \le \epsilon_T}\left[RT + \ln f(\epsilon) + \beta\epsilon\right]\right\}, \tag{43}$$

where $\doteq$ denotes asymptotic equality in the exponential scale,$^2$ as $T \to \infty$, and where we have
modified the notation from $Z_a(\beta)$ to $Z_a(\beta, R)$ to emphasize the dependence on the exponential
growth rate $R$ of the parameter $K$. Any further derivation, from this point onward, requires the
knowledge of the pdf $f(\epsilon)$, which is known accurately only for certain specific choices of the pulse
shape. One of them, which we will assume here for concreteness, is the rectangular pulse

$$s(t) = \begin{cases} \sqrt{\dfrac{E}{\Delta}}, & |t| \le \dfrac{\Delta}{2} \\[2mm] 0, & \text{elsewhere} \end{cases} \tag{44}$$

where $E = PT$, $P$ being the average power of the signal. Therefore,

$$R_s(\tau) = E\left[1 - \frac{|\tau|}{\Delta}\right]_+ = PT\left[1 - \frac{|\tau|}{\Delta}\right]_+, \tag{45}$$



where $[x]_+ = \max\{0, x\}$. From a result by Slepian [16], in a form that was later derived by Shepp [15]
(see also [20]), it is known that if $X_\theta$ is a zero-mean Gaussian random process with autocorrelation
function $R(\tau) = [1 - |\tau|]_+$, then the cumulative probability distribution function of $Y = \sup_{0 \le \theta \le 1} X_\theta$
is given by

$$F_0(a) = \Pr\{Y \le a\} = [1 - \Phi(a)]^2 - \frac{ae^{-a^2/2}}{\sqrt{2\pi}}[1 - \Phi(a)] - \frac{e^{-a^2}}{2\pi}, \tag{46}$$

where

$$\Phi(a) \triangleq \frac{1}{\sqrt{2\pi}}\int_a^\infty e^{-u^2/2}\,du. \tag{47}$$

This means that the density of $Y$ is given by

$$f_0(a) = \frac{dF_0(a)}{da} = \frac{ae^{-a^2}}{2\pi} + [1 - \Phi(a)](1 + a^2)\frac{e^{-a^2/2}}{\sqrt{2\pi}}. \tag{48}$$

This result applies, in our case, to the random process

$$X_\theta = \sqrt{\frac{2}{N_0PT}}\int_{-T/2}^{+T/2} dt\, n(t)s(t - \theta\Delta), \qquad 0 \le \theta \le 1, \tag{49}$$

which means that for $i \ne 0$, the probability density function of $\epsilon_i$ is given by

$$f(\epsilon) = \sqrt{\frac{2}{N_0PT}} \cdot f_0\left(\frac{\epsilon}{\sqrt{N_0PT/2}}\right). \tag{50}$$

$^2$ For two non-negative functions $a(T)$ and $b(T)$, the notation $a(T) \doteq b(T)$ means that $\lim_{T \to \infty} \frac{1}{T}\ln\frac{a(T)}{b(T)} = 0$.
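The distribution (46) and the density (48) can be cross-checked numerically. The following minimal sketch implements both, using the convention of (47) in which $\Phi$ is the upper Gaussian tail, and compares the density against a finite-difference derivative of the distribution function. The function names and the step size are illustrative assumptions.

```python
import math

def normal_tail(a):
    """Phi(a) of eq. (47): upper tail probability of a standard Gaussian."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def normal_pdf(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def sup_cdf(a):
    """F_0(a) of eq. (46) for the supremum over the unit interval."""
    g = 1.0 - normal_tail(a)           # standard Gaussian CDF
    phi = normal_pdf(a)
    return g * g - a * phi * g - phi * phi

def sup_pdf(a):
    """f_0(a) of eq. (48)."""
    g = 1.0 - normal_tail(a)
    phi = normal_pdf(a)
    return a * phi * phi + g * (1.0 + a * a) * phi

def numeric_derivative(func, a, step=1e-5):
    """Central finite difference, used only as an independent check."""
    return (func(a + step) - func(a - step)) / (2.0 * step)
```

For large $a$, the second term of `sup_pdf` dominates and behaves like $a^2 e^{-a^2/2}/\sqrt{2\pi}$, which is the approximation used in the asymptotic analysis of this section.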



Thus,

$$Z_a(\beta, R) \doteq \exp\left\{RT + \frac{1}{2}\ln\left(\frac{2}{N_0PT}\right) + \max_{\epsilon \le \epsilon_T}\left[\ln f_0\left(\frac{\epsilon}{\sqrt{N_0PT/2}}\right) + \beta\epsilon\right]\right\}. \tag{51}$$

Now, the exact form of $f_0$ may not lend itself to convenient analysis, but considering the asymptotic
limit of $T \to \infty$, it is not difficult to see (due to the scaling by $\sqrt{N_0PT/2}$ in the argument of $f_0(\cdot)$)
that the maximum at the exponent of the last expression is attained for values of $\epsilon$ that grow
without bound as $T \to \infty$. It would therefore be convenient to approximate $f_0(a)$ given above by
its dominant term for very large $a$, which is given by

$$f_0(a) \approx \frac{a^2e^{-a^2/2}}{\sqrt{2\pi}}. \tag{52}$$

On substituting this approximation, we first find an approximation to $\epsilon_T$ according to

$$2\ln\left(\frac{\epsilon}{\sqrt{N_0PT/2}}\right) - \frac{\epsilon^2}{N_0PT} = -RT. \tag{53}$$

For large $T$, the first term is negligible compared to the second term and the right-hand side, and
so, $\epsilon_T$ is well approximated as

$$\epsilon_T = \sqrt{N_0PR} \cdot T. \tag{54}$$

Next, we use the approximate form of $f_0$ in the maximization of $\ln f_0(\epsilon/\sqrt{N_0PT/2}) + \beta\epsilon$, i.e., solve
the problem

$$\max_{\epsilon \le \sqrt{N_0PR}\,T}\left[\ln\left(\frac{2\epsilon^2}{N_0PT}\right) - \frac{\epsilon^2}{N_0PT} + \beta\epsilon\right], \tag{55}$$

whose maximizer, for large $T$, is easily found to be approximated by

$$\epsilon^* = \min\left\{\sqrt{N_0PR} \cdot T,\ \frac{\beta N_0PT}{2}\right\}. \tag{56}$$

On substituting this back into the expression of $Z_a(\beta, R)$, and defining

$$\psi_a(\beta, R) \triangleq \lim_{T \to \infty} \frac{\ln Z_a(\beta, R)}{T}, \tag{57}$$

we get

$$\psi_a(\beta, R) = \begin{cases} R + \dfrac{\beta^2N_0P}{4}, & \beta < \beta_c(R) \\[2mm] \beta\sqrt{N_0PR}, & \beta \ge \beta_c(R) \end{cases} \tag{58}$$

where

$$\beta_c(R) \triangleq \frac{2}{N_0}\sqrt{\frac{R}{C}}, \tag{59}$$

$C = P/N_0$ being the capacity of the infinite-bandwidth AWGN channel. Thus, we see that $Z_a(\beta, R)$
undergoes a phase transition at $\beta = \beta_c(R)$: For $\beta < \beta_c(R)$, $Z_a(\beta, R)$ is dominated by an exponential
number of $\{i\}$ for which $\epsilon_i$ is about $\beta N_0PT/2$. As $\beta$ exceeds $\beta_c(R)$, the system pertaining to $Z_a$
undergoes a phase transition, where $Z_a(\beta, R)$ becomes dominated by a sub-exponential number of
$\{i\}$ at the 'ground state' level of $\sqrt{N_0PR} \cdot T$. This sub-exponential number of dominant ground-state
'configurations' corresponds to a zero entropy, yet disordered phase, which is called, in the
terminology of physicists, the glassy phase (see [13, Chap. 5]).
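The two branches of (58) meet continuously at $\beta = \beta_c(R)$, where both equal $2R$. The following minimal sketch encodes (56), (58) and (59) and allows this to be verified numerically; the function names and the sample values of $N_0$, $P$ and $R$ are illustrative assumptions.

```python
import math

def beta_c(rate, n0, power):
    """Critical inverse temperature of eq. (59), with C = P / N0."""
    capacity = power / n0
    return (2.0 / n0) * math.sqrt(rate / capacity)

def epsilon_star_per_t(beta, rate, n0, power):
    """Dominant energy level of eq. (56), normalized by T."""
    return min(math.sqrt(n0 * power * rate), beta * n0 * power / 2.0)

def psi_a(beta, rate, n0, power):
    """Exponential growth rate of Z_a, eq. (58)."""
    if beta < beta_c(rate, n0, power):
        return rate + beta * beta * n0 * power / 4.0   # exponentially many terms
    return beta * math.sqrt(n0 * power * rate)         # ground-state (glassy) branch

n0, power, rate = 0.5, 2.0, 3.0        # hypothetical values; C = 4
bc = beta_c(rate, n0, power)
```

Note that the first branch never falls below the second (by the arithmetic-geometric mean inequality), so the transition at $\beta_c(R)$ is not a maximum of the two expressions but a consequence of the constraint $\epsilon \le \epsilon_T$ in (55).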

Taking now into account the contribution of $Z_0(\beta)$, and defining

$$\psi(\beta, R) \triangleq \lim_{T \to \infty} \frac{\ln Z(\beta, R)}{T}, \tag{60}$$

we end up with three phases, as can be seen in the following expression:

$$\psi(\beta, R) = \begin{cases} \beta P, & \{R < P(\beta - \beta^2N_0/4),\ \beta < 2/N_0\} \cup \{R < C,\ \beta \ge 2/N_0\} \\[1mm] R + \dfrac{\beta^2N_0P}{4}, & \{R \ge C,\ \beta < \beta_c(R)\} \cup \{P(\beta - \beta^2N_0/4) \le R < C,\ \beta < 2/N_0\} \\[1mm] \beta\sqrt{N_0PR}, & \text{elsewhere} \end{cases} \tag{61}$$

The phase diagram is depicted in Fig. 1. As said earlier, for ML estimation, the relevant regime
is $\beta \to \infty$, where, as can be seen, the system undergoes a phase transition as $R$ exceeds $C$. This
phase transition captures the threshold effect in the estimation of the delay parameter $m$, in this
example.
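The three-phase expression (61) follows from comparing the exponent $\beta P$ of $Z_0$ with the exponent $\psi_a$ of $Z_a$. The sketch below classifies a point $(\beta, R)$ accordingly; the function name, the phase labels as strings, the tie-breaking at the boundaries, and the sample values are illustrative assumptions.

```python
import math

def phase(beta, rate, n0, power):
    """Return (label, psi) for the dominant term of Z = Z_0 + Z_a, as in eq. (61)."""
    capacity = power / n0
    beta_crit = (2.0 / n0) * math.sqrt(rate / capacity)
    ordered = beta * power                         # exponent of Z_0
    if beta < beta_crit:
        label = "paramagnetic"
        anomalous = rate + beta * beta * n0 * power / 4.0
    else:
        label = "glassy"
        anomalous = beta * math.sqrt(n0 * power * rate)
    if ordered > anomalous:                        # the small-error term dominates
        return "ferromagnetic", ordered
    return label, anomalous

# Hypothetical setting N0 = 2, P = 2, so that C = 1 and 2/N0 = 1.
examples = {
    (5.0, 0.5): phase(5.0, 0.5, 2.0, 2.0),
    (5.0, 2.0): phase(5.0, 2.0, 2.0, 2.0),
    (5.0, 30.0): phase(5.0, 30.0, 2.0, 2.0),
    (0.5, 0.9): phase(0.5, 0.9, 2.0, 2.0),
}
```

In this setting, moving along $\beta = 5$ from $R = 0.5$ to $R = 2$ to $R = 30$ passes through the ferromagnetic, glassy and paramagnetic phases in turn, while at $\beta = 0.5 < 2/N_0$ the paramagnetic phase is already entered below $C$.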

As long as $R < C$, the probability of anomaly is still vanishingly small, and the dominant event
is that of a small error (less than $\Delta/T$ in absolute value). The critical point, where all three phases
meet, is the point $(C, 2/N_0)$. Note that $\beta = 2/N_0$ is the 'natural' value of $\beta$ that arises in the true
posterior of $m$ given $\{r(t),\ -T/2 \le t \le T/2\}$.

As we can see, the physical perspective provides some insight, not only concerning the estimation
of the parameter $m$, but moreover, about the posterior of $m$ given the noisy signal $\{r(t),\ -T/2 \le
t \le T/2\}$. If we use the 'correct' value of $\beta$ or larger, i.e., $\beta \ge 2/N_0$, then as long as $R < C$, the
posterior possesses a very sharp peak around the true value of $m$ and the width of this peak does
not exceed $\Delta/T$ from either side. This is the ordered phase, or the ferromagnetic phase, in the
jargon of physicists. As $R$ crosses $C$, the behavior is as follows: If $\beta = 2/N_0$, the posterior
changes abruptly and instead of one peak around the true $m$, it becomes dominated by exponentially
many 'spikes' scattered across the whole interval $[-M, +M]$. This is the paramagnetic phase. If,
on the other hand, $\beta > 2/N_0$, then there is an intermediate range of rates $R \in [C, \beta^2N_0P/4]$,
where the number of such spikes is still sub-exponential, which means the glassy phase. Finally,
as one continues to increase $R$ above $\beta^2N_0P/4$, the number of spikes becomes exponential (the
paramagnetic phase).

On the other hand, for $\beta < 2/N_0$, the abrupt transition to exponentially many spikes happens
for $R = P(\beta - \beta^2N_0/4)$, which is less than $C$. The fixed bandwidth regime corresponds to the
vertical axis ($R = 0$) in the phase diagram, and as can be seen, no phase transition occurs along
this axis at any finite temperature. This is in agreement with our earlier discussion on the graceful
behavior of the probability of anomaly at fixed bandwidth.

[Figure 1: Phase diagram for ML estimation of a delay parameter. Recoverable labels: the boundary $R = C$, the curve $\beta = \beta_c(R)$, the glassy region, and $C = P/N_0$.]

It is instructive to compare the behavior of the ML estimator to the Weiss-Weinstein lower
bound [18], [17] because this bound is claimed to capture the threshold effect. As we have seen,
the ML estimator has the following ranges of exponential behavior as a function of $R$:

$$E\{(\hat{m} - m)^2\} \sim \begin{cases} e^{-2RT}, & R < C/6 \\ e^{-E(R)T}, & C/6 \le R < C \\ e^{-0 \cdot T}, & R \ge C \end{cases} \tag{62}$$

On the other hand, the Weiss-Weinstein bound (WWB) for estimating a rectangular pulse in
Gaussian white noise is given (in our notation) by

$$\mathrm{WWB} = \max_{h \ge 0} \frac{h^2[1 - h/T]_+^2 \exp\{-[h/\Delta]^+ CT/2\}}{2\left[1 - (1 - 2h/T)_+ \exp\{-[h/\Delta]^+ CT/2\}\right]}, \tag{63}$$

where $[x]_+ = \max\{x, 0\}$ and $[x]^+ = \min\{x, 1\}$. Examining this bound under the asymptotic regime
of $T \to \infty$ with $\Delta = \Delta_0 e^{-RT}$ yields the following behavior:

$$\mathrm{WWB} \sim \begin{cases} e^{-2RT}, & R < C/4 \\ e^{-CT/2}, & R \ge C/4 \end{cases} \tag{64}$$

In agreement with the analysis in [17], we readily observe that for a given $R$ and for high SNR
($C = P/N_0 \to \infty$), both quantities are of the exponential order of $e^{-2RT}$, whereas for low SNR
($C \to 0$), both are about $e^{-CT/2} \sim e^{-0 \cdot T}$. However, if we look at both quantities as functions of $R$
for fixed $C > 0$, there is a different behavior. Not only do the phase transition points differ, but
the large-$R$ asymptotics disagree as well. Thus, the WWB indeed captures the threshold effect of the ML
estimator, but in a slightly weaker sense when it comes to the asymptotic wide-band regime.
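
The comparison between (62) and (64) can be made concrete by tabulating the two exponents. The sketch below is illustrative only: the function names are hypothetical, and the form of $E(R)$ is an assumption, taken here to be the classical error exponent of orthogonal signalling over the AWGN channel ($C/2 - R$ for $R < C/4$, $(\sqrt{C} - \sqrt{R})^2$ for $C/4 \le R < C$, and zero beyond), which is consistent with the breakpoint $R = C/6$ in (62) but is not restated in this section.

```python
import math


def orthogonal_error_exponent(R, C):
    """Assumed E(R): error exponent of orthogonal signalling over AWGN."""
    if R < C / 4.0:
        return C / 2.0 - R
    if R < C:
        return (math.sqrt(C) - math.sqrt(R)) ** 2
    return 0.0


def ml_mse_exponent(R, C):
    """Exponent a such that the ML mean square error behaves like exp(-a*T), per (62)."""
    if R < C / 6.0:
        return 2.0 * R
    return orthogonal_error_exponent(R, C)


def wwb_exponent(R, C):
    """Exponent of the Weiss-Weinstein bound, per (64)."""
    return 2.0 * R if R < C / 4.0 else C / 2.0


if __name__ == "__main__":
    C = 1.0
    for R in (0.1, 0.2, 0.5, 2.0):
        print(R, ml_mse_exponent(R, C), wwb_exponent(R, C))
```

Since the WWB is a lower bound on the mean square error, its exponent should never be smaller than that of the ML estimator; in this sketch the two coincide only for $R < C/6$, and for $R \ge C$ the bound keeps decaying at rate $C/2$ while the ML error no longer decays.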

Discussing Some Extensions 

It is interesting to slightly expand the scope to a situation of mismatched estimation. Suppose that
instead of ML estimation based on the known waveform $s(t)$, the estimator is based on maximizing
the temporal correlation with another waveform, $\tilde{s}(t - mT)$, whose energy is $E = PT$ and whose
width is $\tilde{\Delta} = \Delta_0 e^{-\tilde{R}T}$. In this case, the phase diagram, in the plane of $\beta$ vs. $\tilde{R}$, will remain
essentially the same as in Fig. 1, except that there will be a degradation by a factor of $\rho$ in $\beta$, and
by a factor of $\rho^2$ in the rate, where

$$\rho \triangleq \frac{1}{E} \int_{-T/2}^{+T/2} s(t)\tilde{s}(t)\,dt. \tag{65}$$

In other words, the triple point will be $(\rho^2 C, 2\rho/N_0)$, and the vertical straight-line ferromagnetic-glassy
phase boundary will be $\tilde{R} = \rho^2 C$, rather than $\tilde{R} = C$. The other phase boundaries will be as
follows: the paramagnetic-ferromagnetic boundary is the parabola $\tilde{R} = P(\rho\beta - \beta^2 N_0/4)$, and the
paramagnetic-glassy boundary would continue to be the parabola $\beta = \beta_c(\tilde{R})$, where the function
$\beta_c(\cdot)$ is as defined before. The dependence on the parameter $R$ of the real signal is solely via its
effect on the parameter $\rho$.
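
As a consistency check on the mismatched boundaries, the location of the triple point can be recovered by intersecting the vertical boundary with the parabola. The short derivation below is a sketch under the assumption that the boundaries are exactly as quoted above, writing $R$ for the horizontal (rate) coordinate of the mismatched diagram and using $C = P/N_0$.

```latex
\begin{align*}
P\left(\rho\beta - \frac{\beta^2 N_0}{4}\right) &= \rho^2 C = \frac{\rho^2 P}{N_0} \\
\frac{N_0}{4}\,\beta^2 - \rho\beta + \frac{\rho^2}{N_0} &= 0 \\
\frac{N_0}{4}\left(\beta - \frac{2\rho}{N_0}\right)^2 &= 0
\quad\Longrightarrow\quad \beta = \frac{2\rho}{N_0}.
\end{align*}
```

The double root means that the line $R = \rho^2 C$ touches the parabola at its vertex, so the three boundaries meet at the single point $(\rho^2 C, 2\rho/N_0)$, and setting $\rho = 1$ recovers the matched triple point $(C, 2/N_0)$.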

Our derivations above are somewhat specific to the example of time delay estimation, and to
the special case of a rectangular pulse. Therefore, a few words about the more general picture
are in order. First, consider time delay estimation of more general signals. We assumed that
$R_s(\tau)$ vanishes for $|\tau| > \Delta$, but this still leaves room for more general pulses with support $\Delta$, not
necessarily the rectangular one. Unfortunately, as said earlier, the exact pdf of $\epsilon_i$, $i \ne 0$, is not
known for a general autocorrelation function that is induced by a general choice of $s(t)$. However,
for our asymptotic analysis in the regime of $T \to \infty$, what counts (as we have seen) is actually
merely the tail behavior of this pdf, and this tail is known, under fairly general conditions (see [U
p. 40], with a reference also to [10]), to behave the same way as the tail of the Gaussian pdf of
zero mean and variance $N_0 PT/2$. Therefore, our approximate analysis in the large $T$ limit would
continue to apply for other pulse shapes as well.

Second, consider the estimation of parameters other than delay (e.g., frequency offset or phase),
still requiring that the time correlation between $s_m(t)$ and $s_{m'}(t)$ would essentially vanish whenever
$|m - m'|$ exceeds a certain threshold (in our earlier example, $\Delta/T$). In this case, as we have
seen, the high-SNR MSE is inversely proportional to the squared norm, $\dot{E}$, of the vector $\dot{s}(m)$
of derivatives of $\{s_i(m)\}$ w.r.t. $m$. Again, assuming that this norm is independent of $m$, it is
proportional to the length of the signal locus, as discussed earlier. For a good trade-off between
the high-SNR MSE and the anomalous MSE, we would like to modulate the parameter in such a
way that for a given $E$, the quantity $\dot{E}$ would grow exponentially with $T$, i.e., $\dot{E} \propto e^{2RT}$, as an
extension of our earlier discussion in the case of a time delay. For example, in the case of frequency-position
modulation, where $s_m(t) = A\cos(2\pi(f_c + mW)t + \phi)$, $|m| \le M$, $W \ll f_c$, both $f_c$ and
$W$ should be proportional to $e^{RT}$. The corresponding analysis of $\epsilon_i$ and the associated partition
function would be, in principle, similar to the one before, except that one should consider the process
$X_\theta = \int_{-T/2}^{+T/2} n(t)\cos(2\pi(f_c + \theta W)t + \phi)\,dt$, and the remarks of the previous paragraph continue to
apply. Similar comments apply to other kinds of parametrization.

4 Joint ML Estimation of Amplitude and Delay 

We now extend our earlier study to the model

$$r(t) = \alpha \cdot s(t - mT) + n(t), \qquad -\frac{T}{2} \le t < +\frac{T}{2}, \tag{66}$$

where now both $\alpha$ and $m$ are parameters to be estimated, and where it is assumed that $m \in
[-M, +M]$ as before and $\alpha \in [\alpha_{\min}, \alpha_{\max}]$, with $0 \le \alpha_{\min} < 1 < \alpha_{\max}$ and

$$\frac{1}{\alpha_{\max} - \alpha_{\min}} \int_{\alpha_{\min}}^{\alpha_{\max}} \alpha^2\,d\alpha = 1, \tag{67}$$

which means that the average energy (w.r.t. the uniform distribution within the interval $[\alpha_{\min}, \alpha_{\max}]$)
of the received signal is still $E$. Here the energy of the received signal depends on $\alpha$, as it is given
by $\alpha^2 E$. The relevant partition function would be

$$Z(\beta, R) = \int_{\alpha_{\min}}^{\alpha_{\max}} d\alpha \sum_i \exp[\beta(\alpha\epsilon_i - \alpha^2 PT/2)] = \int_{\alpha_{\min}}^{\alpha_{\max}} d\alpha\, Z(\alpha, \beta, R). \tag{68}$$

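The analysis of $Z_a(\alpha, \beta, R)$ (which is the same expression except that the sum excludes $i = 0$) in
the framework of a REM-like model is precisely the same as before, except that $\beta$ is replaced by
$\beta\alpha$ and there is another multiplicative factor of $\exp\{-\beta\alpha^2 PT/2\}$. Accordingly, re-defining

$$\psi_a(\alpha, \beta, R) = \lim_{T \to \infty} \frac{\ln Z_a(\alpha, \beta, R)}{T}, \tag{69}$$

we get the following results: For $\beta < \beta_c(R)/\alpha_{\max}$,

$$\psi_a(\alpha, \beta, R) = R + \frac{\beta\alpha^2 P}{4}(\beta N_0 - 2) \qquad \forall\, \alpha_{\min} \le \alpha \le \alpha_{\max}. \tag{70}$$

Similarly, for $\beta > \beta_c(R)/\alpha_{\min}$,

$$\psi_a(\alpha, \beta, R) = \beta\left(\alpha\sqrt{N_0 PR} - \frac{\alpha^2 P}{2}\right) \qquad \forall\, \alpha_{\min} \le \alpha \le \alpha_{\max}. \tag{71}$$

Finally, for $\beta \in (\beta_c(R)/\alpha_{\max}, \beta_c(R)/\alpha_{\min})$ we have:

$$\psi_a(\alpha, \beta, R) = \begin{cases}
R + \dfrac{\beta\alpha^2 P}{4}(\beta N_0 - 2) & \alpha_{\min} \le \alpha \le \dfrac{2}{\beta N_0}\sqrt{\dfrac{R}{C}} \\[2ex]
\beta\left(\alpha\sqrt{N_0 PR} - \dfrac{\alpha^2 P}{2}\right) & \dfrac{2}{\beta N_0}\sqrt{\dfrac{R}{C}} < \alpha \le \alpha_{\max}
\end{cases} \tag{72}$$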

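Upon maximizing over $\alpha$, we get five different phases of $\psi_a(\beta, R) = \max_\alpha \psi_a(\alpha, \beta, R)$, three glassy
phases and two paramagnetic ones:

$$\psi_a(\beta, R) = \begin{cases}
\beta\left(\alpha_{\min}\sqrt{N_0 PR} - \dfrac{\alpha_{\min}^2 P}{2}\right) & R < \alpha_{\min}^2 C \text{ and } \beta > \dfrac{\beta_c(R)}{\alpha_{\min}} \\[2ex]
\dfrac{\beta N_0 R}{2} & R \in (\alpha_{\min}^2 C, \alpha_{\max}^2 C) \text{ and } \beta > \dfrac{2}{N_0} \\[2ex]
\beta\left(\alpha_{\max}\sqrt{N_0 PR} - \dfrac{\alpha_{\max}^2 P}{2}\right) & R > \alpha_{\max}^2 C \text{ and } \beta > \dfrac{\beta_c(R)}{\alpha_{\max}} \\[2ex]
R + \dfrac{\beta\alpha_{\min}^2 P}{4}(\beta N_0 - 2) & \beta < \min\left\{\dfrac{2}{N_0}, \dfrac{\beta_c(R)}{\alpha_{\min}}\right\} \\[2ex]
R + \dfrac{\beta\alpha_{\max}^2 P}{4}(\beta N_0 - 2) & R > \alpha_{\max}^2 C \text{ and } \beta \in \left(\dfrac{2}{N_0}, \dfrac{\beta_c(R)}{\alpha_{\max}}\right)
\end{cases} \tag{73}$$

Figure 2: Phase diagram of $Z_a(\beta, R)$ for joint ML estimation of amplitude and delay. [Image not
reproduced; axes: $R$ (horizontal) and $\beta$ (vertical).]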

In Figure 2 we show the phase diagram of $\psi_a(\beta, R)$. As can be seen, the paramagnetic phase is
split into two sub-phases, according to $\beta < 2/N_0$ and $\beta > 2/N_0$, whereas the glassy phase is
split into three parts, according to the range of $R$.
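
The five-phase expression can be checked numerically against a direct maximization over the amplitude. The following is a minimal sketch, not the author's procedure: it assumes $\beta_c(R) = 2\sqrt{R/(N_0 P)}$ (the REM critical point implied by the boundary $R = \beta^2 N_0 P/4$), takes the per-amplitude exponent from (70)-(72), and compares the closed form (73) with a brute-force grid search. All function names are hypothetical, and the normalization (67) is not enforced on the example values.

```python
import math


def beta_c(R, P, N0):
    """Assumed critical inverse temperature: beta_c(R) = 2 * sqrt(R / (N0 * P))."""
    return 2.0 * math.sqrt(R / (N0 * P))


def psi_a_alpha(alpha, beta, R, P, N0):
    """Per-amplitude exponent psi_a(alpha, beta, R), following (70)-(72)."""
    if beta * alpha < beta_c(R, P, N0):
        return R + beta * alpha ** 2 * P / 4.0 * (beta * N0 - 2.0)  # paramagnetic
    return beta * (alpha * math.sqrt(N0 * P * R) - alpha ** 2 * P / 2.0)  # glassy


def psi_a_closed(beta, R, P, N0, a_min, a_max):
    """Closed-form maximum over alpha, following the five phases of (73)."""
    C = P / N0
    bc = beta_c(R, P, N0)

    def para(a):
        return R + beta * a ** 2 * P / 4.0 * (beta * N0 - 2.0)

    def glassy(a):
        return beta * (a * math.sqrt(N0 * P * R) - a ** 2 * P / 2.0)

    if R < a_min ** 2 * C and beta > bc / a_min:
        return glassy(a_min)
    if a_min ** 2 * C <= R <= a_max ** 2 * C and beta > 2.0 / N0:
        return beta * N0 * R / 2.0
    if R > a_max ** 2 * C and beta > bc / a_max:
        return glassy(a_max)
    if R > a_max ** 2 * C and beta > 2.0 / N0:
        return para(a_max)
    return para(a_min)  # beta below min{2/N0, beta_c(R)/a_min}


def psi_a_brute(beta, R, P, N0, a_min, a_max, n=20001):
    """Grid search over alpha in [a_min, a_max]."""
    step = (a_max - a_min) / (n - 1)
    return max(psi_a_alpha(a_min + k * step, beta, R, P, N0) for k in range(n))


if __name__ == "__main__":
    P, N0, a_min, a_max = 1.0, 1.0, 0.5, 1.5
    for beta, R in [(0.5, 0.1), (1.5, 0.1), (8.0, 0.6), (2.5, 5.0), (4.0, 5.0)]:
        print(beta, R, psi_a_closed(beta, R, P, N0, a_min, a_max),
              psi_a_brute(beta, R, P, N0, a_min, a_max))
```

The five sample points in the usage loop are chosen to fall in the five different phases, one each, so agreement between the two columns exercises every branch of the closed form.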

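Finally, when we take into account the contribution of $Z_0(\beta) = e^{\beta PT/2}$, where it is assumed
that the true values of the parameters are $\alpha_0 = 1$ and $m_0 = 0$, we end up with the following expression
for the re-defined

$$\psi(\beta, R) = \lim_{T \to \infty} \frac{\ln Z(\beta, R)}{T}, \tag{74}$$

which is given by

$$\psi(\beta, R) = \begin{cases}
\dfrac{\beta P}{2} & \left\{R < C \text{ and } \beta \ge \dfrac{2}{N_0}\right\} \cup \left\{R < R_\beta \text{ and } \beta < \dfrac{2}{N_0}\right\} \\[2ex]
\dfrac{\beta N_0 R}{2} & R \in (C, \alpha_{\max}^2 C) \text{ and } \beta > \dfrac{2}{N_0} \\[2ex]
\beta\left(\alpha_{\max}\sqrt{N_0 PR} - \dfrac{\alpha_{\max}^2 P}{2}\right) & R > \alpha_{\max}^2 C \text{ and } \beta > \dfrac{\beta_c(R)}{\alpha_{\max}} \\[2ex]
R + \dfrac{\beta\alpha_{\max}^2 P}{4}(\beta N_0 - 2) & R > \alpha_{\max}^2 C \text{ and } \beta \in \left(\dfrac{2}{N_0}, \dfrac{\beta_c(R)}{\alpha_{\max}}\right) \\[2ex]
R + \dfrac{\beta\alpha_{\min}^2 P}{4}(\beta N_0 - 2) & R > R_\beta \text{ and } \beta < \dfrac{2}{N_0}
\end{cases} \tag{75}$$

where

$$R_\beta \triangleq \frac{P}{2}\left[\beta(1 + \alpha_{\min}^2) - \frac{\beta^2 N_0 \alpha_{\min}^2}{2}\right]. \tag{76}$$

The phase diagram of this function is depicted in Fig. 3.

Discussion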

Although the model is linear in the parameter $\alpha$, its interaction with $m$ exhibits, in general, more
phases than the parameter $m$ alone, and it causes anomalies in the estimation of $\alpha$ as well, but
these anomalies have a different character than those associated with $m$: While the anomaly makes
the estimator of $m$ become an essentially uniformly distributed random variable within the interval
$[-M, +M]$, the anomalous estimator of $\alpha$ tends to concentrate on a deterministic value as $T \to \infty$.
To see why this is true, observe that in the limit of large $\beta$ (which is relevant for ML estimation), as
long as $R < C$, the estimation error is typically not anomalous. For $C < R < \alpha_{\max}^2 C$, the dominant
value of $\alpha$ is $\sqrt{R/C}$, whereas for $R > \alpha_{\max}^2 C$, the dominant value of $\alpha$ is $\alpha_{\max}$. For low $\beta$, we also
identify the region where the posterior of $(\alpha, m)$ is dominated by points where $\alpha = \alpha_{\min}$.

Referring to Fig. 3, in the special case where $\alpha_{\max} = \infty$, the eastern glassy phase and the
northern paramagnetic phase disappear, and we end up with three phases only: the ordered phase
(unaltered), the southern paramagnetic phase, and the western glassy phase. If, in addition, $\alpha_{\min} = 0$
(i.e., we know nothing a-priori on $\alpha$), then the curve $R = R_\beta$ becomes a straight line ($R = \beta P/2$)
and in the paramagnetic region, we get $\psi(\beta, R) = R$. On the other hand, in the case $\alpha_{\min} = \alpha_{\max} = 1$
(i.e., $\alpha = 1$ and there is no uncertainty in $\alpha$), we are back to the earlier case of a delay parameter
only.
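
The two limiting cases of the boundary curve can be verified with a few lines of code. This is an illustrative sketch only, reading the boundary in (76) as $R_\beta = \frac{P}{2}[\beta(1 + \alpha_{\min}^2) - \beta^2 N_0 \alpha_{\min}^2/2]$; the function name `r_beta` and the numerical values are arbitrary assumptions.

```python
def r_beta(beta, P, N0, a_min):
    """Boundary between the ordered phase and the low-beta paramagnetic phase."""
    return 0.5 * P * (beta * (1.0 + a_min ** 2) - beta ** 2 * N0 * a_min ** 2 / 2.0)


if __name__ == "__main__":
    P, N0, beta = 2.0, 0.5, 1.3  # arbitrary illustrative values

    # a_min = 0: the curve should reduce to the straight line R = beta * P / 2.
    print(r_beta(beta, P, N0, 0.0), beta * P / 2.0)

    # a_min = 1: the curve should reduce to the delay-only parabola
    # R = P * (beta - beta^2 * N0 / 4).
    print(r_beta(beta, P, N0, 1.0), P * (beta - beta ** 2 * N0 / 4.0))
```

In both cases the two printed numbers should agree up to floating-point rounding, matching the statements that $\alpha_{\min} = 0$ gives a straight line and that $\alpha_{\min} = \alpha_{\max} = 1$ recovers the delay-only diagram.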

5 Summary and Conclusion 

In this paper, we proposed a statistical-mechanical perspective on the threshold effect in parameter
estimation of non-linearly modulated wide-band signals corrupted by additive white Gaussian noise.
The proposed framework, which is mapped into a REM-like model of disordered spin glasses,
provides a fairly comprehensive picture of the behavior of the ML estimator as a function of the
bandwidth parameter $R$ and the temperature parameter $\beta$. We then extended the scope to joint
ML estimation of two parameters.

The concepts and the techniques exercised in this paper are believed to generalize to other signal
models, as well as to joint ML estimation of more than two parameters. The proposed approach
may therefore serve as a yardstick for gaining insights and understanding concerning the threshold
behavior in more complicated situations, including models which are expected to exhibit more
than one threshold with respect to the SNR (which means more than one phase transition in the
analogous physical model). Examples include models of superimposed signals, where each component
signal has its own threshold SNR, or combinations of threshold effects due to non-linearity (as
studied here) with threshold effects that stem from ambiguity. The latter is characteristic, for
example, when the delay of a narrow-band signal is to be estimated (see, e.g., [17]).

Figure 3: Phase diagram of $Z(\beta, R)$ for joint ML estimation of amplitude and delay. [Image not
reproduced; surviving labels: the vertical lines $R = C$ and $R = \alpha_{\max}^2 C$, the ordered phase with
$\psi(\beta, R) = \beta P/2$, and the curve $R = R_\beta$.]